A Novel Fuzzy based Clustering Algorithm for Text Classification
نویسندگان
چکیده
Due to the flourish of World Wide Web and the rapid development of the Internet technology, the increasing volume of digital textual data become more and more unmanageable, therefore the importance of text classification has gained significant attention. Text classification pose some specific challenges such as high dimensionality with each document (data point) having only a very small subset of them and representing multiple labels at the same time. Feature clustering is a technique which is more powerful that can reduce the dimensionality of a feature vector drastically. Many researchers worked on Feature Clustering for efficient text classification. Recently a Fuzzy based feature clustering was proposed in which Gaussian distribution is used for fuzzy membership function for clustering. But the problem of skewness may occur with this distribution for some type of data. To overcome that we propose a novel Fuzzy similarity based membership function for efficient clustering and with this proposed algorithm satisfactory results obtained. General Terms Feature Clustering, Skewness, Split distribution.
منابع مشابه
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
متن کاملخوشهبندی اسناد مبتنی بر آنتولوژی و رویکرد فازی
Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...
متن کاملA Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset
Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationary and uncertainty of the signals should be used for ECG analysis. Ex...
متن کاملOil Reservoirs Classification Using Fuzzy Clustering (RESEARCH NOTE)
Enhanced Oil Recovery (EOR) is a well-known method to increase oil production from oil reservoirs. Applying EOR to a new reservoir is a costly and time consuming process. Incorporating available knowledge of oil reservoirs in the EOR process eliminates these costs and saves operational time and work. This work presents a universal method to apply EOR to reservoirs based on the available data by...
متن کاملON FUZZY NEIGHBORHOOD BASED CLUSTERING ALGORITHM WITH LOW COMPLEXITY
The main purpose of this paper is to achieve improvement in thespeed of Fuzzy Joint Points (FJP) algorithm. Since FJP approach is a basisfor fuzzy neighborhood based clustering algorithms such as Noise-Robust FJP(NRFJP) and Fuzzy Neighborhood DBSCAN (FN-DBSCAN), improving FJPalgorithm would an important achievement in terms of these FJP-based meth-ods. Although FJP has many advantages such as r...
متن کامل